This notebook follows some of the examples in Visualization in Bayesian Workflow by Gabry, Simpson, Vehtari, Betancourt, and Gelman.
Our goal for this problem is to estimate ground-station values of \(PM_{2.5}\) (air concentrations of particulate matter of 2.5 microns or below), with uncertainty intervals, from satellite data. That is, we want to use a high-quality, reliable dataset that suffers from sparse and biased collection to calibrate a noisier dataset that we can collect anywhere. The plan:
library(tidyverse)
library(rstan)
library(tidybayes)
library(knitr)
# plot themes
theme_set(theme_minimal())
# stan options
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
data <- read_csv(
"../../data/particulate_data.csv",
locale = locale(encoding="iso-8859-1"),
col_types = "cccicddddc"
)
# add a column to track city id
data <- data %>% mutate(city_id = row_number())
# show data
data
## # A tibble: 2,980 x 11
## City_locality iso3 country super_region super_region_na… pm25 sat_2014
## <chr> <chr> <chr> <int> <chr> <dbl> <dbl>
## 1 <NA> AUS Austra… 1 HighIncome 33.5 33.3
## 2 Ílhavo PRT Portug… 1 HighIncome 15 9.87
## 3 Ávila ESP Spain 1 HighIncome 10 9.18
## 4 AACHEN DEU Germany 1 HighIncome 13 15.5
## 5 AALBORG DNK Denmark 1 HighIncome 11 9.80
## 6 AALEN DEU Germany 1 HighIncome 12 19.3
## 7 AALESUND NOR Norway 1 HighIncome 7.5 4.73
## 8 AARSCHOT BEL Belgium 1 HighIncome 15 15.8
## 9 Aba NGA Nigeria 7 Sub-Saharan Afr 49 45.8
## 10 Abakaliki NGA Nigeria 7 Sub-Saharan Afr 28 52.2
## # ... with 2,970 more rows, and 4 more variables: log_pm25 <dbl>,
## # log_sat <dbl>, cluster_region <chr>, city_id <int>
data[,8:11] # just to show the rest of the columns
## # A tibble: 2,980 x 4
## log_pm25 log_sat cluster_region city_id
## <dbl> <dbl> <chr> <int>
## 1 3.51 3.51 2 1
## 2 2.71 2.29 2 2
## 3 2.30 2.22 2 3
## 4 2.56 2.74 2 4
## 5 2.40 2.28 2 5
## 6 2.48 2.96 2 6
## 7 2.01 1.55 2 7
## 8 2.71 2.76 2 8
## 9 3.89 3.82 4 9
## 10 3.33 3.96 4 10
## # ... with 2,970 more rows
Since we’re relating two variables, expect a lot of scatter plots.
Start with the logarithm of the ground-station measurements, plotted against the logarithm of the satellite measurements:
ggplot(data, aes(x = log_sat, y = log_pm25)) +
geom_point() +
geom_smooth(method = "lm")
If you squint, the linear regression line seems like an OK(ish) place to start; the data definitely has an overall linear correlation. The focus of the rest of this notebook will be on the extent to which deviations from this linear relationship are going to mess us up.
If we look closer, however, the errors don’t seem to be totally random (i.e. they’re not IID normal) - the data has some structure to it, with some areas tending to have positive residuals and others negative.
We can visualize this more easily by plotting the difference between log_pm25 and log_sat on the y-axis instead of just log_pm25 by itself:
ggplot(data, aes(x = log_sat, y = log_pm25 - log_sat)) +
geom_point() +
geom_smooth(method = "lm") +
geom_hline(yintercept = 0, color = "red", linetype = "dashed")
If the errors were IID we would expect them to be normally distributed around 0 (the red dashed line). But there is clearly a non-random pattern here. Let’s do some more exploration to see if we can (qualitatively) understand where this is coming from.
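For contrast, here’s what IID errors would look like. In the toy simulation below (made-up coefficients, not this dataset), the residuals of a linear fit show no leftover structure: their mean is zero and they’re uncorrelated with the predictor.

```r
# Toy simulation (arbitrary made-up coefficients, not fit to this dataset):
# a truly linear relationship with IID normal noise.
set.seed(11)
x <- runif(1000, 0.5, 4.5)                 # stand-in for log_sat
y <- 0.8 + 0.8 * x + rnorm(1000, 0, 0.3)   # stand-in for log_pm25
r <- resid(lm(y ~ x))
c(mean_resid = mean(r), cor_with_x = cor(r, x))  # both essentially zero
```

Plotting r against x for this toy data gives featureless scatter around zero, unlike the clumped residuals we see above.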
One possibility is that different types of locations have different relationships between satellite and ground station data. We could do a quick check by breaking out the above plot by WHO super-regions:
ggplot(data, aes(x = log_sat, y = log_pm25, color = super_region_name)) +
geom_point(alpha = 0.3) +
geom_smooth(method = "lm", se = FALSE) +
theme(legend.position="bottom")
Notice that the different geographic areas only partially overlap, and regression lines fit to each show very different relationships. Gabry et al also tried grouping stations using hierarchical clustering (labeled cluster_region in the dataset):
ggplot(data, aes(x = log_sat, y = log_pm25, color = cluster_region)) +
geom_point(alpha = 0.3) +
geom_smooth(method = "lm", se = FALSE) +
theme(legend.position="bottom")
Just as in the super-region plot, we see very different relationships here. When we start modeling, we’ll need to be on the lookout for biases in predictions for different types of locations.
To have something to benchmark against, let’s start by fitting a Bayesian linear regression,
\[ y_i \sim N(\beta_0 + \beta_1x_i, \sigma^2) \] where \(y_i\) is the ground-station \(PM_{2.5}\) measurement, \(x_i\) is the corresponding satellite measurement, and \(\beta_0\), \(\beta_1\), and \(\sigma\) are the parameters we need to infer. We’ll use normal priors on \(\beta_0\) and \(\beta_1\), and a half-normal prior on \(\sigma\) (since it has to be positive).
Before fitting anything, we should do a “keep yourself honest” check to see how well we’ve specified our model. Even without touching Stan, it’s easy to draw fake data from the prior predictive distribution.
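For instance, here’s a base-R sketch that mirrors the weakly informative normal(0, 100) priors we’re about to write down in Stan: draw parameters from the priors, then draw a fake dataset given those parameters.

```r
# Prior predictive sketch in base R: sample (beta0, beta1, sigma) from the
# priors, then sample fake observations given those parameters.
set.seed(42)
x <- seq(0.5, 4.5, length.out = 100)   # a plausible range of log_sat values
one_prior_dataset <- function() {
  beta0 <- rnorm(1, 0, 100)
  beta1 <- rnorm(1, 0, 100)
  sigma <- abs(rnorm(1, 0, 100))       # half-normal via absolute value
  rnorm(length(x), beta0 + beta1 * x, sigma)
}
prior_sims <- replicate(9, one_prior_dataset())  # 100 x 9 matrix of fake log_pm25
dim(prior_sims)
```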
Let’s plot some of our prior predictive datasets alongside our actual data, starting with weakly informative priors:
prior_weak_file <- "../../stan/particulate_matter_ols_prior_weak.stan"
cat(read_file(prior_weak_file))
// ols model for particulate matter
// generate from the prior
data {
int<lower=1> N; // number of observations
vector[N] log_sat; // log of satellite measurements
}
parameters {
real beta0; // global intercept
real beta1; // global slope
real<lower=0> sigma; // error sd for Gaussian likelihood
}
model {
// priors
beta0 ~ normal(0, 100);
beta1 ~ normal(0, 100);
sigma ~ normal(0, 100);
}
generated quantities {
vector[N] log_pm_prior; // vector of data generated from prior
for (i in 1:N) {
log_pm_prior[i] = normal_rng(beta0 + beta1 * log_sat[i],sigma);
}
}
Compile this Stan model.
prior_weak_model <- stan_model(prior_weak_file)
Now let’s sample from this model.
prior_data <- list(N = nrow(data), log_sat = data$log_sat)
prior_weak_samples <- sampling(prior_weak_model, data = prior_data, iter = 300, chains = 1)
##
## SAMPLING FOR MODEL 'particulate_matter_ols_prior_weak' NOW (CHAIN 1).
##
## Gradient evaluation took 5e-06 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.05 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 300 [ 0%] (Warmup)
## Iteration: 30 / 300 [ 10%] (Warmup)
## Iteration: 60 / 300 [ 20%] (Warmup)
## Iteration: 90 / 300 [ 30%] (Warmup)
## Iteration: 120 / 300 [ 40%] (Warmup)
## Iteration: 150 / 300 [ 50%] (Warmup)
## Iteration: 151 / 300 [ 50%] (Sampling)
## Iteration: 180 / 300 [ 60%] (Sampling)
## Iteration: 210 / 300 [ 70%] (Sampling)
## Iteration: 240 / 300 [ 80%] (Sampling)
## Iteration: 270 / 300 [ 90%] (Sampling)
## Iteration: 300 / 300 [100%] (Sampling)
##
## Elapsed Time: 0.100519 seconds (Warm-up)
## 0.053731 seconds (Sampling)
## 0.15425 seconds (Total)
Extract the samples into a data frame using the tidybayes function spread_draws(). This produces some new columns that you may not have seen before:
- .chain: the chain number. Since we only ran one chain these will all be 1.
- .iteration: the iteration number within each chain.
- .draw: a unique number for each draw from the posterior.
- city_id: the subscript for the variable (in this case for log_pm_prior). Since we are estimating 2980 observations this will vary from 1 to 2980.
- log_pm_prior: the variable name.

log_pm_weak <- prior_weak_samples %>%
spread_draws(log_pm_prior[city_id]) %>%
ungroup()
log_pm_weak
## # A tibble: 447,000 x 5
## .chain .iteration .draw city_id log_pm_prior
## <int> <int> <int> <int> <dbl>
## 1 1 1 1 1 341.
## 2 1 1 1 2 262.
## 3 1 1 1 3 290.
## 4 1 1 1 4 328.
## 5 1 1 1 5 324.
## 6 1 1 1 6 329.
## 7 1 1 1 7 255.
## 8 1 1 1 8 339.
## 9 1 1 1 9 369.
## 10 1 1 1 10 375.
## # ... with 446,990 more rows
Note that the 447,000 rows come from 2,980 observations × 1 chain × 150 iterations (half of the 300 iterations are discarded as warmup).
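A quick check of that arithmetic:

```r
# observations x chains x post-warmup iterations
2980 * 1 * (300 / 2)
## [1] 447000
```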
Let’s take a look at 9 random draws from our estimates. We’ll also add in the original data so we can compare against it.
log_pm_weak_comparison <- log_pm_weak %>%
filter(.draw %in% sample(max(.draw), size = 9L)) %>%
left_join(data %>% select(city_id, log_sat, log_pm25), by = "city_id") %>%
select(.draw, city_id, log_sat, observed = log_pm25, sampled = log_pm_prior)
log_pm_weak_comparison
## # A tibble: 26,820 x 5
## .draw city_id log_sat observed sampled
## <int> <int> <dbl> <dbl> <dbl>
## 1 8 1 3.51 3.51 1029.
## 2 8 2 2.29 2.71 567.
## 3 8 3 2.22 2.30 450.
## 4 8 4 2.74 2.56 548.
## 5 8 5 2.28 2.40 609.
## 6 8 6 2.96 2.48 610.
## 7 8 7 1.55 2.01 429.
## 8 8 8 2.76 2.71 670.
## 9 8 9 3.82 3.89 772.
## 10 8 10 3.96 3.33 740.
## # ... with 26,810 more rows
Let’s visualize these results to see if our prior distribution is producing feasible values for log_pm25.
log_pm_weak_comparison %>%
gather("type", "log_pm_25", observed:sampled) %>%
ggplot(aes(x = log_sat, y = log_pm_25, color = type)) +
geom_point(alpha = 0.3) +
facet_wrap(~ .draw)
Some of these simulations include predictions orders of magnitude away from what we actually observe (remember, we’re working on log scales here!). In some cases the slope even goes the wrong way - these priors generate data that’s decidedly unphysical. This is bad for two reasons: the sampler can waste effort exploring absurd regions of parameter space, and those regions can pull our inferences toward nonsense wherever the data are sparse.
So what should we be looking for? Following Gabry et al, what we need in our priors is containment: priors that keep us inside sensible parts of the parameter space.
So our priors should be broader than (and include) our data - but not be so broad that they make impossible predictions. Let’s try again with our intercept constrained to be near zero, slope biased toward being a small positive number (since we already know the measurements are roughly correlated), and noise confined to near the unit scale:
prior_file <- "../../stan/particulate_matter_ols_prior.stan"
cat(read_file(prior_file))
// ols model for particulate matter
// generate from the prior
data {
int<lower=1> N; // number of observations
vector[N] log_sat; // log of satellite measurements
}
parameters {
real beta0; // global intercept
real beta1; // global slope
real<lower=0> sigma; // error sd for Gaussian likelihood
}
model {
// priors
beta0 ~ normal(0,1);
beta1 ~ normal(1,1);
sigma ~ normal(0,1);
}
generated quantities {
vector[N] log_pm_prior; // vector of data generated from prior
for (i in 1:N) {
log_pm_prior[i] = normal_rng(beta0 + beta1 * log_sat[i],sigma);
}
}
Compile this Stan model.
prior_model <- stan_model(prior_file)
Now let’s sample from this model.
prior_samples <- sampling(prior_model, data = prior_data, iter = 300, chains = 1)
##
## SAMPLING FOR MODEL 'particulate_matter_ols_prior' NOW (CHAIN 1).
##
## Gradient evaluation took 5e-06 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 0.05 seconds.
## Adjust your expectations accordingly!
##
##
## Iteration: 1 / 300 [ 0%] (Warmup)
## Iteration: 30 / 300 [ 10%] (Warmup)
## Iteration: 60 / 300 [ 20%] (Warmup)
## Iteration: 90 / 300 [ 30%] (Warmup)
## Iteration: 120 / 300 [ 40%] (Warmup)
## Iteration: 150 / 300 [ 50%] (Warmup)
## Iteration: 151 / 300 [ 50%] (Sampling)
## Iteration: 180 / 300 [ 60%] (Sampling)
## Iteration: 210 / 300 [ 70%] (Sampling)
## Iteration: 240 / 300 [ 80%] (Sampling)
## Iteration: 270 / 300 [ 90%] (Sampling)
## Iteration: 300 / 300 [100%] (Sampling)
##
## Elapsed Time: 0.054332 seconds (Warm-up)
## 0.054696 seconds (Sampling)
## 0.109028 seconds (Total)
log_pm_prior <- prior_samples %>%
spread_draws(log_pm_prior[city_id]) %>%
ungroup()
log_pm_prior
## # A tibble: 447,000 x 5
## .chain .iteration .draw city_id log_pm_prior
## <int> <int> <int> <int> <dbl>
## 1 1 1 1 1 -1.63
## 2 1 1 1 2 -1.36
## 3 1 1 1 3 -0.949
## 4 1 1 1 4 -1.57
## 5 1 1 1 5 -1.27
## 6 1 1 1 6 -2.06
## 7 1 1 1 7 -0.720
## 8 1 1 1 8 -1.89
## 9 1 1 1 9 -2.29
## 10 1 1 1 10 -2.63
## # ... with 446,990 more rows
Let’s take a look at 9 random draws from our estimates. We’ll also join in the original data (by city_id) so we can compare against it.
log_pm_comparison <- log_pm_prior %>%
filter(.draw %in% sample(max(.draw), size = 9L)) %>%
left_join(data %>% select(city_id, log_sat, log_pm25), by = "city_id") %>%
select(.draw, city_id, log_sat, observed = log_pm25, sampled = log_pm_prior)
log_pm_comparison
## # A tibble: 26,820 x 5
## .draw city_id log_sat observed sampled
## <int> <int> <dbl> <dbl> <dbl>
## 1 3 1 3.51 3.51 9.90
## 2 3 2 2.29 2.71 6.10
## 3 3 3 2.22 2.30 7.53
## 4 3 4 2.74 2.56 6.00
## 5 3 5 2.28 2.40 7.23
## 6 3 6 2.96 2.48 9.58
## 7 3 7 1.55 2.01 3.72
## 8 3 8 2.76 2.71 8.68
## 9 3 9 3.82 3.89 8.44
## 10 3 10 3.96 3.33 8.48
## # ... with 26,810 more rows
Let’s visualize these results to see if our prior distribution is producing feasible values for log_pm25.
log_pm_comparison %>%
gather("type", "log_pm_25", observed:sampled) %>%
ggplot(aes(x = log_sat, y = log_pm_25, color = type)) +
geom_point(alpha = 0.3) +
facet_wrap(~ .draw)
Here’s Gabry et al’s code for the basic regression model:
model_file <- "../../stan/particulate_matter_ols.stan"
cat(read_file(model_file))
// ols model for particulate matter
data {
int<lower=1> N; // number of observations
vector[N] log_sat; // log of satellite measurements
vector[N] log_pm; // log of ground PM_2.5 measurements
}
parameters {
real beta0; // global intercept
real beta1; // global slope
real<lower=0> sigma; // error sd for Gaussian likelihood
}
model {
// priors
beta0 ~ normal(0,1);
beta1 ~ normal(1,1);
sigma ~ normal(0,1);
// likelihood
log_pm ~ normal(beta0 + beta1 * log_sat, sigma);
}
generated quantities {
vector[N] log_lik; // pointwise log-likelihood for LOO
vector[N] log_pm_rep; // replications from posterior predictive dist
for (n in 1:N) {
real log_pm_hat_n = beta0 + beta1 * log_sat[n];
log_lik[n] = normal_lpdf(log_pm[n] | log_pm_hat_n, sigma);
log_pm_rep[n] = normal_rng(log_pm_hat_n, sigma);
}
}
Compile this Stan model.
model <- stan_model(model_file)
Now let’s sample from this model.
model_data <- list(N = nrow(data), log_sat = data$log_sat, log_pm = data$log_pm25)
model_samples <- sampling(model, data = model_data, iter = 1000, chains = 4)
And extract the draws for \(\beta_0\), \(\beta_1\), and \(\sigma\).
coef_samples <- model_samples %>%
spread_draws(beta0, beta1, sigma) %>%
ungroup()
coef_samples
## # A tibble: 2,000 x 6
## .chain .iteration .draw beta0 beta1 sigma
## <int> <int> <int> <dbl> <dbl> <dbl>
## 1 1 1 1 0.907 0.700 0.453
## 2 1 2 2 0.890 0.706 0.444
## 3 1 3 3 0.865 0.712 0.451
## 4 1 4 4 0.856 0.714 0.448
## 5 1 5 5 0.851 0.713 0.449
## 6 1 6 6 0.936 0.689 0.450
## 7 1 7 7 0.925 0.690 0.451
## 8 1 8 8 0.930 0.697 0.450
## 9 1 9 9 0.912 0.689 0.453
## 10 1 10 10 0.912 0.688 0.452
## # ... with 1,990 more rows
Gabry’s paper has some slick visualizations for investigating your MCMC run, which we’ll talk about in the future when we discuss Hamiltonian Monte Carlo. For now, just do some quick trace plots to ensure nothing terrible happened:
coef_samples %>%
gather("parameter", "value", beta0:sigma) %>%
ggplot(aes(x = .draw, y = value)) +
geom_line() +
facet_wrap(~parameter, ncol = 1, scales = "free_y")
Looks like we’re mixing OK.
Not surprisingly, the slope and intercept show some negative correlation:
ggplot(coef_samples, aes(x = beta0, y = beta1)) + geom_hex() + scale_fill_viridis_c()
ggplot(coef_samples, aes(x = beta0, y = sigma)) + geom_hex() + scale_fill_viridis_c()
ggplot(coef_samples, aes(x = beta1, y = sigma)) + geom_hex() + scale_fill_viridis_c()
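This negative correlation between intercept and slope is a generic feature of regression with an uncentered, mostly positive predictor (a general fact, not specific to this model): raising the intercept must be compensated by lowering the slope. A quick OLS illustration on toy data:

```r
# Toy illustration: intercept/slope correlation under OLS,
# before and after centering the predictor.
set.seed(3)
x <- runif(1000, 1, 5)                       # positive predictor, like log_sat
y <- 1 + 0.7 * x + rnorm(1000, 0, 0.5)
cov2cor(vcov(lm(y ~ x)))[1, 2]               # strongly negative
cov2cor(vcov(lm(y ~ I(x - mean(x)))))[1, 2]  # essentially zero
```

Centering log_sat before fitting would largely remove this correlation, which can also make life easier for the sampler.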
Now that we’ve fit a model, we can draw data from the posterior predictive distribution \(p(y^\prime|y)\). We can do a couple of things with this fake data to get some intuition for how our model is doing: compare simulated datasets against the real data directly, and compare summary (test) statistics computed from each.
In either case, we can break those checks out by subsets of the data (like WHO super region) to see whether the model describes the data equally well in different regions, or whether we’ll tend to see geographic biases in any estimates we make from the model.
Let’s extract the posterior predictive distribution samples:
log_pm_ppd <- model_samples %>%
spread_draws(log_pm_rep[city_id]) %>%
ungroup() %>%
select(-.chain, -.iteration)
log_pm_ppd
## # A tibble: 5,960,000 x 3
## .draw city_id log_pm_rep
## <int> <int> <dbl>
## 1 1 1 2.33
## 2 2 1 3.34
## 3 3 1 4.16
## 4 4 1 2.60
## 5 5 1 3.37
## 6 6 1 3.36
## 7 7 1 2.79
## 8 8 1 3.15
## 9 9 1 4.52
## 10 10 1 3.13
## # ... with 5,959,990 more rows
Here’s a flipbook of scatter plots comparing PPD datasets with the real data. Again we sample 9 random draws.
log_pm_ppd9 <- log_pm_ppd %>%
filter(.draw %in% sample(max(.draw), size = 9L)) %>%
left_join(data %>% select(city_id, log_sat, log_pm25), by = "city_id") %>%
select(.draw, city_id, log_sat, observed = log_pm25, sampled = log_pm_rep)
log_pm_ppd9
## # A tibble: 26,820 x 5
## .draw city_id log_sat observed sampled
## <int> <int> <dbl> <dbl> <dbl>
## 1 56 1 3.51 3.51 3.13
## 2 84 1 3.51 3.51 4.49
## 3 261 1 3.51 3.51 3.52
## 4 395 1 3.51 3.51 3.65
## 5 636 1 3.51 3.51 3.65
## 6 1047 1 3.51 3.51 3.66
## 7 1234 1 3.51 3.51 4.04
## 8 1313 1 3.51 3.51 3.58
## 9 1502 1 3.51 3.51 3.38
## 10 56 2 2.29 2.71 3.38
## # ... with 26,810 more rows
log_pm_ppd9 %>%
gather("type", "log_pm25", observed:sampled) %>%
ggplot(aes(x = log_sat, y = log_pm25, color = type)) +
geom_point(size = 0.5, alpha = 0.3) +
facet_wrap(~ .draw)
While they certainly overlap, there’s (as expected) some interesting structure in the real data that the PPD doesn’t capture. Breaking out by super region makes the differences more obvious:
log_pm_ppd9 %>%
left_join(data %>% select(city_id, super_region_name), by = "city_id") %>%
gather("type", "log_pm_25", observed:sampled) %>%
ggplot(aes(x = log_sat, y = log_pm_25, color = type)) +
geom_point(alpha = 0.1, size = 0.1) +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~ super_region_name)
If we compare the histogram of observed \(PM_{2.5}\) values with samples from the PPD we see that the real data has some asymmetry that the synthetic data is missing:
log_pm_ppd9 %>%
gather("type", "log_pm_25", observed:sampled) %>%
ggplot(aes(x = log_pm_25, group = str_c(.draw, type))) +
geom_freqpoly(aes(color = type), bins = 20)
A danger with posterior predictive checking is that we’re double dipping with our data - using it once to fit the model and a second time to check it. When we’re comparing summary statistics, a poor choice of statistic could obscure a serious problem with the model.
Consider, for example, a model where one parameter controls the mean of the posterior distribution. Comparing means of samples from the PPD will probably look good (because that parameter can learn the mean of your data easily) even if the shape of the fit is terrible. It’s good practice, then, to select test statistics that aren’t directly related to a single model parameter.
In this case, since our histogram showed that the real data is much less symmetric than the posterior predictive distribution, let’s use the skewness (a measure of distributional asymmetry):
skew <- function(x) {
mu <- mean(x)
numerator <- mean((x - mu)^3)
denominator <- mean((x - mu)^2)^(3/2)
numerator/denominator
}
observed_skewness <- skew(data$log_pm25)
observed_skewness
## [1] 0.5562795
log_pm_ppd %>%
group_by(.draw) %>%
summarize(sk = skew(log_pm_rep)) %>%
ggplot(aes(x = sk)) +
geom_histogram(bins = 30) +
geom_vline(xintercept = observed_skewness, linetype = "dashed")
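As a quick sanity check on the statistic itself (a toy example, not part of the paper’s workflow): symmetric data should give a skewness near zero, while the exponential distribution has a theoretical skewness of 2.

```r
# Redefine skew() here so this chunk stands alone.
skew <- function(x) {
  mu <- mean(x)
  mean((x - mu)^3) / mean((x - mu)^2)^(3/2)
}
set.seed(1)
sk_norm <- skew(rnorm(1e5))  # symmetric: near 0
sk_exp <- skew(rexp(1e5))    # right-skewed: near 2
c(normal = sk_norm, exponential = sk_exp)
```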
Breaking a statistic out by WHO super region (let’s try the median this time), we see that not only do the samples fail to resemble the observed data, they’re wrong in different directions for different regions! If we were answering questions that compared estimates across geographical regions, this would be a first step toward getting really dumb answers.
ppd_median_by_region <- log_pm_ppd %>%
left_join(data %>% select(city_id, super_region_name), by = "city_id") %>%
group_by(super_region_name, .draw) %>%
summarize(med = median(log_pm_rep)) %>%
ungroup()
data_median_by_region <- data %>%
group_by(super_region_name) %>%
summarize(med = median(log_pm25))
ggplot() +
geom_histogram(aes(x = med), data = ppd_median_by_region, bins = 30) +
geom_vline(aes(xintercept = med), linetype = "dashed", data = data_median_by_region) +
facet_wrap(~ super_region_name)
What’s next? By this point we should all agree that this problem deserves a more nuanced model than simple linear regression. Build a better model and run some of the same tests to see how it performs!
Also: here’s a hierarchical model, adapted from Gabry’s GitHub:
hier_file <- "../../stan/particulate_matter_multilevel.stan"
cat(read_file(hier_file))
// multilevel model for particulate matter
// assume non-centered data
data {
int<lower=1> N; // number of observations
int<lower=1> R; // number of super regions
int<lower=1,upper=R> region[N]; // region IDs
vector[N] log_sat; // log of satellite measurements
vector[N] log_pm; // log of ground PM_2.5 measurements
}
parameters {
real beta0; // global intercept
real beta1; // global slope
vector[R] beta0_region_raw; // 'raw' region intercept offsets for NCP
vector[R] beta1_region_raw; // 'raw' region slope offsets for NCP
real<lower=0> tau0; // sd of beta0_region
real<lower=0> tau1; // sd of beta1_region
real<lower=0> sigma; // error sd for Gaussian likelihood
}
model {
// mean of likelihood
vector[N] mu;
// priors
beta0 ~ normal(0,1);
beta1 ~ normal(1,1);
tau0 ~ normal(0,1);
tau1 ~ normal(0,1);
beta0_region_raw ~ normal(0,1);
beta1_region_raw ~ normal(0,1);
sigma ~ normal(0,1);
// construct mean of each observation
// each region has a unique intercept and slope...
// ...so we iterate through the observations and
// lookup the betaX_region that matches the region
// of observation i
for (i in 1:N) {
mu[i] = (beta0 + beta0_region_raw[region[i]]) + (beta1 + beta1_region_raw[region[i]]) * log_sat[i];
}
// likelihood
log_pm ~ normal(mu, sigma);
}
generated quantities {
vector[N] log_lik; // pointwise log-likelihood for LOO
vector[N] log_pm_rep; // replications from posterior predictive dist
for (i in 1:N) {
// what's the mean for observation i?
real log_pm_hat_n;
log_pm_hat_n = (beta0 + beta0_region_raw[region[i]]) + (beta1 + beta1_region_raw[region[i]]) * log_sat[i];
// now compute log likelihood and simulate from posterior
log_lik[i] = normal_lpdf(log_pm[i] | log_pm_hat_n, sigma);
log_pm_rep[i] = normal_rng(log_pm_hat_n, sigma);
}
}
Compile and sample values from the above hierarchical model. Perform the same posterior predictive checks we did for previous examples. Contrast the results to determine if this model is an improvement over the other ones.
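As a starting point, here’s a sketch of the data list this model expects (note that region must contain 1-based integer region IDs). A tiny fake dataset stands in for data so the list-building logic is self-contained; the commented lines show how compilation and sampling would proceed, as before.

```r
# Build the list the multilevel model expects (`fake` stands in for `data`).
fake <- data.frame(
  super_region = c(1L, 1L, 2L, 3L, 2L),  # 1-based integer region IDs
  log_sat = c(2.1, 3.0, 3.8, 3.2, 4.0),
  log_pm25 = c(2.4, 2.9, 3.9, 3.5, 4.2)
)
hier_data <- list(
  N = nrow(fake),
  R = max(fake$super_region),            # number of super regions
  region = fake$super_region,
  log_sat = fake$log_sat,
  log_pm = fake$log_pm25
)
# Then, as before (not run here):
# hier_model <- stan_model(hier_file)
# hier_samples <- sampling(hier_model, data = hier_data, iter = 1000, chains = 4)
```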